Legged Robot League

• Typical scenario: pre-coordination

- People practice together
- Robots given coordination languages, protocols
- "Locker room agreement" (Stone & Veloso, '99)
Ad Hoc Teams

• Ad hoc team player is an individual
- Unknown teammates (programmed by others)

• May or may not be able to communicate

• Teammates likely sub-optimal: no control

Challenge: Create a good team player
May Have Different Capabilities
And/Or Maneuverability
May be a Previously Unknown Type
Human Ad Hoc Teams

• Military and industrial settings
- Outsourcing
• Agents support human ad hoc team formation
(Just et al., 2004; Kildare, 2004)

• Autonomous agents (robots) deployed for short times

- Teams developed as cohesive groups
- Tuned to interact well together
Challenge Statement

Create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members.
Evaluation: A Metric

• Most meaningful when $a_0$ and $a_1$ have similar individual competencies
Evaluation: Domain Consisting of Tasks
Evaluation: Set of Possible Teammates
Evaluation: Draw a Random Task
Evaluation: Random Team, Check Comp
Evaluation: Replace Random with $a_0$
Evaluation: Then $a_1$, Evaluate Diff
Evaluation: Repeat
Evaluate($a_0$, $a_1$, $A$, $D$)

• Initialize performance (reward) counters $r_0$ and $r_1$ for agents $a_0$ and $a_1$, respectively, to $r_0 = r_1 = 0$.

• Repeat:
- Sample a task $d$ from $D$.
- Randomly draw a subset of agents $B$, $|B| \geq 2$, from $A$ such that $E[s(B, d)] \geq s_{\min}$.
- Randomly select one agent $b \in B$ to remove from the team to create the team $B^-$.
- Increment $r_0$ by $s(\{a_0\} \cup B^-, d)$.
- Increment $r_1$ by $s(\{a_1\} \cup B^-, d)$.

• If $r_0 > r_1$, then we conclude that $a_0$ is a better ad hoc team player than $a_1$ in domain $D$ over the set of possible teammates $A$.
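
The Evaluate procedure can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions that are not in the slides: the score function `s(team, task)` is supplied by the caller, tasks and team sizes are sampled uniformly, the competence condition is read as "expected score at least `s_min`" and estimated by averaging `n_est` calls to `s`, and the open-ended "Repeat" is replaced by a fixed number of `trials`. All parameter names other than `a0`, `a1`, `A`, `D`, `s`, and `s_min` are hypothetical.

```python
import random
from typing import Callable, FrozenSet, Hashable, Optional, Sequence, Tuple

Agent = Hashable
Task = Hashable
# Hypothetical score function s(team, task); higher means better team performance.
Score = Callable[[FrozenSet[Agent], Task], float]


def evaluate(a0: Agent, a1: Agent, A: Sequence[Agent], D: Sequence[Task],
             s: Score, s_min: float, trials: int = 1000, n_est: int = 5,
             max_draws: int = 100,
             rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Return the accumulated rewards (r0, r1) for agents a0 and a1."""
    rng = rng or random.Random()
    r0 = r1 = 0.0
    for _ in range(trials):
        d = rng.choice(list(D))  # sample a task d from D (uniformly, by assumption)
        B = None
        for _ in range(max_draws):
            # Draw a candidate team with at least two members (requires len(A) >= 2).
            size = rng.randint(2, len(A))
            candidate = frozenset(rng.sample(list(A), size))
            # Estimate E[s(B, d)] by averaging n_est calls, since s may be stochastic.
            estimate = sum(s(candidate, d) for _ in range(n_est)) / n_est
            if estimate >= s_min:
                B = candidate
                break
        if B is None:
            continue  # no sufficiently competent team found; skip this trial
        b = rng.choice(sorted(B, key=repr))  # teammate removed to make room
        B_minus = B - {b}
        r0 += s(B_minus | {a0}, d)  # a0 joins the remaining team
        r1 += s(B_minus | {a1}, d)  # a1 joins the same remaining team
    return r0, r1
```

With the returned pair, `r0 > r1` corresponds to the conclusion in the last bullet. Both agents are scored with the same task and the same remaining team in each trial, so the difference reflects the substituted agent rather than the draw.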
Technical Requirements

• Assess capabilities of other agents (teammate modeling)
• Assess the other agents' knowledge states
• Estimate effects of actions on teammates

• Be prepared to interact with many types of teammates:

- May or may not be able to communicate
- May be more or less mobile
- May be better or worse at sensing

A good team player's best actions will differ depending on its teammates' characteristics.
Preliminary Theoretical Progress

• Aspects can be approached theoretically

• Ultimately an empirical challenge

Be prepared to interact with many types of teammates

• Minimal representative scenarios

- One teammate, no communication
- Fixed and known behavior
Scenarios

• Cooperative iterated normal form game
(w/ Kaminka & Rosenschein, AMEC '09)
• Cooperative k-armed bandit (w/ Kraus, AAMAS '10)

• Cooperative normal form game

$a_0$ | 25   1   0
$a_1$ | 10  30  10

• Cooperative k-armed bandit
3-armed bandit

• Random value from a distribution
• Expected value $\mu$

• Agent A: teacher
- Knows payoff distributions
- Objective: maximize expected sum of payoffs
- If alone, always Arm$_*$

• Agent B: learner

- Can only pull Arm$_1$ or Arm$_2$
- Selects arm with highest observed sample average

• Alternate actions (teacher first)
• Results of all actions fully observable (to both)

• Optimal solution when arms have discrete distribution
• Interesting patterns in optimal action
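
The teacher/learner interaction described above can be simulated with a short Python sketch. It is an illustration only, not the optimal solution mentioned on the slide. The arm names `arm_star`, `arm1`, and `arm2` are labels chosen here, the learner's treatment of unseen arms and ties is an assumption, and `teach_then_exploit` is a hypothetical heuristic that presumes `arm1` has the higher mean of the learner's two arms.

```python
import random
from typing import Callable, Dict, List

History = Dict[str, List[float]]           # observed payoffs per arm, visible to both agents
Policy = Callable[[History], str]          # maps the shared history to an arm name
Sampler = Callable[[random.Random], float]  # draws one payoff from an arm's distribution


def learner_choice(history: History) -> str:
    """Greedy learner: pick arm1 or arm2 by highest observed sample average."""
    def avg(arm: str) -> float:
        pulls = history[arm]
        return sum(pulls) / len(pulls) if pulls else 0.0  # assumption: unseen arm scores 0
    return "arm1" if avg("arm1") >= avg("arm2") else "arm2"  # assumption: ties go to arm1


def always_best(history: History) -> str:
    """Teacher baseline: behave as if alone and always pull the best arm."""
    return "arm_star"


def teach_then_exploit(history: History) -> str:
    """Hypothetical heuristic: pull arm1 while the learner currently prefers arm2."""
    if learner_choice(history) == "arm2":
        return "arm1"  # adds a sample to arm1 to nudge the learner's estimate
    return "arm_star"


def run_episode(arms: Dict[str, Sampler], teacher: Policy,
                rounds: int, rng: random.Random) -> float:
    """Alternate teacher and learner pulls; return the team's total payoff."""
    history: History = {name: [] for name in arms}
    total = 0.0
    for _ in range(rounds):
        for policy in (teacher, learner_choice):  # teacher acts first each round
            arm = policy(history)
            payoff = arms[arm](rng)
            history[arm].append(payoff)  # outcomes are observable to both agents
            total += payoff
    return total
```

Running `run_episode` many times with each teacher policy and comparing the average totals gives a rough sense of when sacrificing an immediate pull on the best arm pays off through better learner choices later.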
Challenge Statement

Create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members.


Learning Agents Research Group © 2010 Peter Stone


Suggested Research Plan

1. Identify the full range of possible teamwork situations that a complete ad hoc team player needs to be capable of addressing ($D$ and $A$).

2. For each such situation, find theoretically optimal and/or empirically effective algorithms for behavior.

3. Develop methods for identifying which type of teamwork situation the agent is currently in, in an online fashion.

• 2 and 3: the core technical challenges

• 1: a knob to incrementally increase difficulty
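
One possible way to approach step 3 is a Bayesian update over a fixed set of candidate teammate types, revised after each observed action. The sketch below is an assumption-laden illustration rather than a method from the slides: the type names, the per-type action likelihoods, and the function `update_posterior` are all hypothetical.

```python
from typing import Dict

Posterior = Dict[str, float]                # teammate type -> probability
Likelihoods = Dict[str, Dict[str, float]]   # teammate type -> P(action | type)


def update_posterior(prior: Posterior, likelihoods: Likelihoods,
                     action: str) -> Posterior:
    """One online Bayes step: reweight each type by how well it predicts the action."""
    unnormalized = {t: p * likelihoods[t].get(action, 0.0) for t, p in prior.items()}
    z = sum(unnormalized.values())
    if z == 0.0:
        # The action is impossible under every model; keep the prior unchanged.
        return dict(prior)
    return {t: w / z for t, w in unnormalized.items()}


# Usage: start from a uniform prior and fold in observations one at a time.
models = {
    "greedy": {"left": 0.9, "right": 0.1},   # hypothetical teammate type
    "random": {"left": 0.5, "right": 0.5},   # hypothetical teammate type
}
belief = {"greedy": 0.5, "random": 0.5}
for observed in ["left", "left", "right"]:
    belief = update_posterior(belief, models, observed)
```

The agent could then select the behavior from step 2 that matches the most probable type, or weight behaviors by the posterior. This only works when the true teammate resembles one of the modeled types, which is exactly the coverage question raised by step 1.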
Much More pertaining to specific teammate characteristics